Polymorphisms of CYP7A1 and HADHB Genes and Their Effects on Milk Production Traits in Chinese Holstein Cows

Simple Summary This study aimed to identify genetic variants related to differences in milk production in Chinese Holstein cows. By analyzing specific genes involved in lipid metabolism, researchers identified variants linked to the milk, fat, and protein yield cows produce over time. They also found patterns in these genetic variations that could affect how the genes work together. The results of this study suggest that these genetic markers could be useful for selecting cows with better milk-production traits. Additionally, the findings could help scientists better understand how these genes influence milk production, potentially leading to improved strategies for breeding dairy cattle. Overall, this research provides valuable insights into enhancing milk production in dairy farming, which could ultimately benefit both farmers and consumers. Abstract Our preliminary research proposed the cytochrome P450 family 7 subfamily A member 1 (CYP7A1) and hydroxyacyl-coenzyme A dehydrogenase trifunctional multienzyme complex beta subunit (HADHB) genes as candidates for association with milk-production traits in dairy cattle because of their differential expression across different lactation stages in the liver tissues of Chinese Holstein cows and their potential roles in lipid metabolism. Hence, we identified single-nucleotide polymorphisms (SNPs) of the CYP7A1 and HADHB genes and validated their genetic effects on milk-production traits in a Chinese Holstein population with the goal of providing valuable genetic markers for genomic selection (GS) in dairy cattle, This study identified five SNPs, 14:g.24676921A>G, 14:g.24676224G>A, 14:g.24675708G>T, 14:g.24665961C>T, and 14:g.24664026A>G, in the CYP7A1 gene and three SNPs, 11:g.73256269T>C, 11:g.73256227A>C, and 11:g.73242290C>T, in HADHB. The single-SNP association analysis revealed significant associations (p value ≤ 0.0461) between the eight SNPs of CYP7A1 and HADHB genes and 305-day milk, fat and protein yields. Additionally, using Haploview 4.2, we found that the five SNPs of CYP7A1 formed two haplotype blocks and that the two SNPs of HADHB formed one haplotype block; notably, all three haplotype blocks were also significantly associated with milk, fat and protein yields (p value ≤ 0.0315). Further prediction of transcription factor binding sites (TFBSs) based on Jaspar software (version 2023) showed that the 14:g.24676921A>G, 14:g.24675708G>T, 11:g.73256269T>C, and 11:g.73256227A>C SNPs could alter the 5′ terminal TFBS of the CYP7A1 and HADHB genes. The 14:g.24665961C>T SNP caused changes in the structural stability of the mRNA for the CYP7A1 gene. These alterations have the potential to influence gene expression and, consequently, the phenotype associated with milk-production traits. In summary, we have confirmed the genetic effects of CYP7A1 and HADHB genes on milk-production traits in dairy cattle and identified potential functional mutations that we suggest could be used for GS of dairy cattle and in-depth mechanistic studies of animals.


Introduction
Milk, often referred to as "white gold", plays a pivotal role as a rich source of essential nutrients supporting growth, bone-mass accrual, and immune function, containing proteins, fats, carbohydrates, minerals, and vitamins [1].The surge in demand for high-quality dairy products is attributed to factors like global population growth, increasing income levels, and shifting dietary preferences [2].This change in demand is particularly pronounced in developing regions, notably in Asia and Africa.Enhancing yield and quality traits represent paramount objectives for dairy breeding, necessitating the implementation of effective breeding strategies such as genomic selection (GS) to optimize these characteristics.
GS facilitates early and precise selection of young bulls and reserve heifers without relying on phenotypic data, thereby shortening the dairy breeding cycle from 5-6 years to about 2 years.By significantly reducing generation intervals, this approach lowers breeding costs and accelerates genetic progress.Therefore, improved accuracy in GS is crucial for the development of improved breeding strategies [3].Single-nucleotide polymorphisms (SNPs), which are prevalent in genomic regions like promoters or enhancers, serve as primary genetic markers.They impact gene expression by altering how regulatory proteins bind or how chromatin is structured, thus affecting protein structure, RNA splicing [4], and miRNA binding [5].They are crucial in influencing pathways associated with milk synthesis and lactation [6][7][8][9].Integrating functional SNPs into genomic-selection chips can enhance the efficiency, accuracy, and sustainability of selection [10], leading to improved milk-production traits in cows.
The rapid development of -omics technology has facilitated the process of functionalgene/locus mining.Our preliminary research proposed that the cytochrome P450 family 7 subfamily A member 1 (CYP7A1) and hydroxyacyl-coenzyme A dehydrogenase trifunctional multienzyme complex beta subunit (HADHB) genes are differentially expressed in different lactation stages in dairy cows and are involved in lipid metabolism [11].The CYP7A1 gene encodes cholesterol 7 alpha-hydroxylase, the rate-limiting enzyme in the classic pathway of bile-acid synthesis in the liver [12].Bile acids are integral to processes such as lipid digestion, absorption, and excretion, as well as to the regulation of cholesterol homeostasis and metabolic pathways.The HADHB gene encodes hydroxyacyl-CoA dehydrogenase, a vital component of the mitochondrial trifunctional protein (MTP) complex, which is responsible for catalyzing the final three steps of mitochondrial beta-oxidation of long-chain fatty acids [13].Furthermore, the CYP7A1 gene is positioned within the quantitative trait loci (QTL) QTL_ID: 2732 and QTL_ID: 3408, which have been demonstrated to affect milkfat yield and percentage and milk protein percentage [14][15][16].HADHB is situated within the QTL region related to milk yield (QTL_ID: 1512) and is approximately 11.8 kb from the SNP rs110711742, which is associated with milk yield [17].These findings suggest a potential association between CYP7A1 and HADHB and milk-production traits.
This study investigates the genetic associations between candidate genes CYP7A1 and HADHB and milk-production traits in dairy cattle, including 305-day milk yield, fat yield, fat percentage, protein yield, and protein percentage.Additionally, predictive analyses were used to assess the regulatory effects of SNPs, particularly the effects on transcription factor binding sites (TFBSs) and mRNA stability, offering insights to support further exploration of causal mutations and their potential applications to developing GS chips for the breeding of dairy cattle.

Animal Selection and Phenotypic Data Collection
We selected 898 dairy cows from 45 Chinese Holstein sire families on 22 farms in Beijing Sunlon Livestock Development Co., Ltd.(Beijing, China) as the experimental population.The 45 bulls were used for SNP identification, and 898 cows were used for association analysis (898 cows in the first lactation and 611 in the second lactation).Each sire family had an average of 21 daughters, and each cow had three generations of pedigree information and Dairy Herd Improvement (DHI) records that were provided by the Beijing Dairy Cattle Centre (Beijing, China) (Table S1).The cows in each sire family were distributed across various dairy farms and were maintained with the same feeding conditions.Data consisted of the milk-production phenotype of each cow for the whole lactation period of the parity, which comprised 305-day milk yield, fat yield, fat percentage, protein yield, and protein percentage.The study was conducted in accordance with Guide for the Care and Use of Laboratory Animals and was approved by the Institutional Animal Care and Use Committee (IACUC) at China Agricultural University (Beijing, China).

Genomic DNA Extraction
The frozen semen of 45 bulls and blood samples from 898 cows were provided by Beijing Dairy Cattle Center (Beijing, China).We extracted genomic DNA from the semen using the salting-out procedure and used a TIANamp Blood DNA Kit (Tiangen, Beijing, China) to extract DNA from the blood samples.The quantity and quality of extracted DNA were determined by a NanoDrop2000 spectrophotometer (Thermo Science, Hudson, NH, USA) and gel electrophoresis (1.5%), respectively.

Identification and Genotyping of SNPs in Candidate Genes
According to the genomic sequences of the two genes, CYP7A1 (NC_037341.1) and HADHB (NC_037338.1) of Bos taurus in Genbank, primers were designed using Primer3.0 (https://bioinfo.ut.ee/primer3-0.4.0/ (accessed on 8 September 2021)).They were then synthesized by Beijing Liuhe Bgi Co., Ltd.(Beijing, China).The primers covered the entire coding region and 2000 base pairs upstream and downstream of the regulatory regions of the genes.The DNA samples from semen were diluted to 50 ng/µL and used for PCR amplification (Table S2).PCR products were analyzed by 2% gel electrophoresis, and the qualified products were sequenced bidirectionally by Beijing Qingke Xinye Biotechnology Co., Ltd.(Beijing, China).Then, the sequences were aligned to the reference sequence (ARS-UCD1.2) using NCBI-blast (https://blast.ncbi.nlm.nih.gov/Blast.cgi(accessed on 20 November 2023)) to identify SNPs.Subsequently, we genotyped the SNPs in the 898 cows using Genotyping by Target Sequencing (GBTS) technology by Boruidi Biotechnology Co., Ltd.(Shijiazhuang, China).The allelic and genotypic frequencies were calculated, and the Hardy−Weinberg equilibrium was tested by the chi-squared test.

Linkage Disequilibrium (LD) Estimation
The extent of LD between the identified SNPs was estimated using Haploview 4.2 (Broad Institute of MIT and Harvard, Cambridge, MA, USA), with the solid spine algorithm.The haplotype block with a frequency greater than 0.05 was retained.The extent of LD is measured by the D ′ value, to which it is proportional.

Association Analysis between SNP/Haplotype and Milk-Production Traits
The MIXED procedure in SAS 9.4 software (SAS Institute Inc., Cary, NC, USA) was used to perform association analysis between the SNP or haplotype block and the five milkproduction traits (305-day milk yield, fat yield, fat percentage, protein yield and protein percentage) using the following animal model: where y is the phenotypic value of each trait of each cow; µ is the overall mean; HYS is the fixed effect of farm (1-22: 22 farms), calving year (1-4: 2012-2015) and calving season (1: April-May; 2: June-August; 3: September-November and 4: December-March); M is the age at calving as a covariant, b is the regression coefficient of covariant M; G is the genotype or haplotype combination effect; a is the individual random additive genetic effect with a distribution of N 0, Iδ 2 e ; A is a pedigree-based relationship matrix with an additive genetic variance of δ 2 a ; and e is the random residual with a distribution of N 0, Iδ 2 e , where I is the unit matrix and δ 2 e is the residual variance.A Bonferroni correction was carried out by multiple tests with a significance level equal to the original p value divided by the number of genotype or haplotype combinations.Furthermore, the additive, dominant, and allelic substitution effects of the SNPs were calculated by the following formulas: a = (X AA − X BB )/2; d = X AB − (X AA + X BB )/2; (1) where a is the additive effect; d is the dominant effect; α is the allelic substitution effect; X AA , X AB and X BB are the least-squares means of milk-production traits for the corresponding genotypes; p is the frequency of allele A; and q is the frequency of allele B.

Prediction of Transcription Factor Binding Sites
We used Jaspar (http://jaspar.genereg.net/(version 2023)) software to predict whether SNPs in the 5 ′ flanking region of CYP7A1 and HADHB genes changed the transcription factor binding sites (TFBS) (relative score ≥ 0.80).

Prediction of mRNA Structure
We used the NCBI database (https://blast.ncbi.nlm.nih.gov/(accessed on 20 November 2023)) to query for synonymy or missense mutations on exons.Subsequently, we used RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi(accessed on 20 November 2023)) to predict the changes to the RNA secondary structure caused by the mutation in the untranslated region, with the parameters of minimum free energy (MFE), partition function, and avoidance of isolated base pairs.MFE was used for a direct comparison of the folding stability of RNAs of the same sizes; the smaller the MFE, the greater the stability.

Single-Marker Association Analysis
The results of the association analysis (Table 2) showed that 14:g.24665961C>T, in CYP7A1, was significantly associated with milk yield (p value = 0.0443) and fat yield (p value ≤ 0.0001) in the first lactation period; four SNPs, 14:g.24676921A>G,14:g.24676224G>A,14:g.24675708G>T, and 14:g.24665961C>T, were significantly associated with milk, fat and protein yields (p value ≤ 0.0059) and 14:g.24664026A>G was highly significantly associated with fat yield (p value = 0.0037) in the second lactation.
In HADHB, all three SNPs, 11:g.73256269T>C,11:g.73256227A>C, and 11:g.73242290C>T, were significantly associated with milk and protein yields in the first lactation (p value ≤ 0.0461) and with milk, fat, and protein yields in the second lactation (p value ≤ 0.001).Additionally, the results of the additive, dominant, and allelic substitution effects were shown in Table 3.

Haplotype Association Analysis
Using Haploview 4.2, five SNPs in CYP7A1, 14:g.24675708G>T,14:g.24676224G>A,14:g.24676921A>G,14:g.24665961C>T and 14:g.24664026A>G,formed two haplotype blocks (Figure 1).In Block 1, the frequencies of haplotypes H1 (AC), H2 (GT) and H3 (AT) were 0.463, 0.342, and 0.194, respectively.Block 2 was composed of three haplotypes, H1 (TGA), H2 (GAG), and H3 (TGG), with frequencies of 0.349, 0.545, and 0.105, respectively.We found that the two blocks in CYP7A1 were significantly associated with milk, fat, and protein yields in both lactations (p value ≤ 0.0011; Table 4).The number in the table represents the mean ± standard deviation; the number in the bracket represents the number of cows with the corresponding genotype; the p value shows the significance calculated for the genetic effects of SNPs; the superscript letters indicate the significance calculated for the effects different genotypes via the comparison of each pair of pairs; a , b , c within the same column with different superscripts means that the p value is < 0.05; and A , B , C within the same column with different superscripts means that the p value is <0.01.Similarly, two SNPs, 11:g.73256269T>C and 11:g.73256227A>C, in HADHB formed Block 3 (Figure 1).The frequencies of haplotypes H1 (AT), H2 (CC), and H3 (CT) were 0.159, 0.806, and 0.035, respectively.This block was found to be associated with milk, fat, and protein yields in the first lactation (p value ≤ 0.0315), and milk, fat, and protein yields in the second lactation (p value ≤ 0.0052; Table 4).

Changes of Transcription Factor Binding Sites Caused by SNPs in 5 ′ Region
Using the JASPAR, changes in TFBS were predicted for four SNPs in the 5 ′ regulatory region of the CYP7A1 and HADHB genes.As a result, it was found that, in CYP7A1, the allele G of 14:g.24676921A>Gcreated a binding site (BS) for transcription factors (TFs) EBF1 and SP1, while the mutation from G to T of 14:g.24675708G>Teliminated the BS for ELK1 but created a BS for FOXC1 and ZNF354C.For HADHB, before and after mutation, 11:g.73256269T>Cresulted in the disappearance of the BS for PAX2 and NR2F1 and the appearance of that for TFAP2A; in addition, the allele A of 11:g.73256227A>Ccreated a BS for ZNF354C and allele C created a BS for HINFP (Table 5).

mRNA Structural Stability Altered by the Mutation in Untranslated Region
Using RNAfold, changes in the mRNA minimum free energy (MFE) were predicted.The mutation of C to T in SNP 14:g.24665961C>Tcauses the MFE of mRNA to change from −578.10 kcal/mol to −539.20 kcal/mol, resulting in a decrease in the structural stability of CYP7A1 mRNA.

Discussion
The CYP7A1 and HADHB genes have been implicated in milk-production traits in dairy cows, as indicated by recent research studies.CYP7A1 is known to influence triglyceride and cholesterol metabolism in the liver tissue of dairy cows and is a rate-limiting enzyme for bile-acid synthesis [18]; in chickens, HADHB plays a crucial role in liver lipid metabolism, particularly in inducing peroxisomal and mitochondrial β-oxidation activity [19].These findings affirm the possible beneficial influence of these genes on milk production.Here, further identification of SNPs in the CYP7A1 and HADHB genes and analysis of their association with milk-production traits in dairy cows provide valuable information for understanding the mechanism of inheritance of these traits and linking the metabolic functions of these genes to their phenotypic effects.Furthermore, the integration of significant genetic sites from the CYP7A1 and HADHB genes into GS models can facilitate more precise selection of individuals with favorable milk-production traits.The purpose of these models is to predict the genetic advantage of the trait of interest based on the SNP profile of the individual, such that breeders can screen individuals for specific genetic variants associated with particular milk-production phenotypes, select them for breeding, and thus improve the efficiency of breeding.
In this study, we identified SNPs in the CYP7A1 and HADHB genes and confirmed their significant associations with milk-production traits in dairy cows.SNPs can cause phenotypic variation by affecting the function or expression of genes involved in various biological processes [20][21][22][23].This study did find significant differences in the milk-production phenotypes of cows with different SNP sites in the CYP7A1 and HADHB genes.The number of individual cows with lower phenotypic values was also smaller, possibly because individuals of genotypes with higher-production phenotypes had been selected for while those with lower-production phenotypes had been eliminated during artificial breeding over the long term [24].In addition, the phenotypic values of the cows and the significance of SNP/haplotype block associations were lower in the first lactation than in the second lactation.This difference could be explained by either of two possible causes.First, the varying numbers of cows in the two lactation periods may impact the statistical significance of the corresponding results.Second, there are physiological differences between the two lactation periods, as dairy cows tend to produce more milk in their second lactation [25].
Transcription factors are proteins that bind to specific DNA sequences and modulate the expression of target genes involved in various biological processes [26].In this study, we found changes in TFBSs caused by SNPs in the 5 ′ flanking regions of the CYP7A1 and HADHB genes.These changes may result in altered regulation of gene expression by TFs, affecting the participation of the corresponding expression products in normal physiological and metabolic activities and thus, in turn, contributes to changes in milk production.For instance, in HADHB, TFAP2A is a specific nuclear transcription factor that may promote cell mean ± standard deviation; the number in the bracket represents the number of cows with the corresponding genotype; the p value shows the significance for the genetic effects of SNPs; the superscript letters indicate the significance calculated for the effects of different genotypes via the comparison of each pair of pairs; a, b within the same column with different superscripts indicates p < 0.05; and A, B, C within the same column with different superscripts indicates p < 0.01.

Figure 1 .
Figure 1.Linkage disequilibrium estimated between SNPs in CYP7A1 and HADHB gene.The blocks indicate haplotype blocks, and the text above the horizontal numbers is the SNP names.The values in the red boxes are pair-wise SNP correlations (D′, while bright red boxes without numbers indicate complete linkage disequilibrium (LD) (D′ = 1)).

Figure 1 .
Figure 1.Linkage disequilibrium estimated between SNPs in CYP7A1 and HADHB gene.The blocks indicate haplotype blocks, and the text above the horizontal numbers is the SNP names.The values in the red boxes are pair-wise SNP correlations (D ′ , while bright red boxes without numbers indicate complete linkage disequilibrium (LD) (D ′ = 1)).

Table 1 .
Detailed information about the identified SNPs and their genotypic and allelic frequencies.

Table 4 .
Associations of haplotype blocks in the CYP7A1 and HADHB genes with milk yield and milk-composition traits in Chinese Holstein cattle during first and second lactations.
The number in the table represents the

Table 2 .
Associations of eight SNPs in the CYP7A1 and HADHB genes with milk yield and milkcomposition traits in Chinese Holstein cattle during the first and second lactations.

Table 3 .
Additive, dominant and allele substitution effects of 8 SNPs in the CYP7A1 and HADHB genes on milk yield and milk-composition traits in Chinese Holstein cattle during two lactations.

Table 4 .
Associations of haplotype blocks in the CYP7A1 and HADHB genes with milk yield and milk-composition traits in Chinese Holstein cattle during first and second lactations.The number in the table represents the mean ± standard deviation; the number in the bracket represents the number of cows with the corresponding genotype; the p value shows the significance for the genetic effects of SNPs; the superscript letters indicate the significance calculated for the effects of different genotypes via the comparison of each pair of pairs; a , b within the same column with different superscripts indicates p < 0.05; and A , B , C within the same column with different superscripts indicates p < 0.01.

Table 5 .
Transcription factor binding sites (TFBSs) prediction for CYP7A1 and HADHB genes.scores represent the correlation score for each predicted binding site, with higher values indicating a stronger possibility of binding.The SNP in the predicted binding-site sequence is underlined. Relative